This Jupyter Notebook runs an AI chatbot assistant designed to interact with the iModulon database. The chatbot uses OpenAI's GPT-4o model to answer queries, provide information, and assist with data analysis related to iModulons. The assistant also has access to gene information from the EcoCyc database.
The notebook is structured as follows:
The chatbot supports a variety of functions related to iModulons, including but not limited to:
To use the chatbot:
import os
import difflib
import traceback
import getpass
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from IPython.display import display, HTML, Markdown
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent, load_tools
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.tools import tool
from imodulon_functions import *
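The `imodulon_functions` module is not shown in this notebook, so the internals of tools such as `find_closest_gene` are unknown here. The sketch below is one plausible way a "find closest" lookup could work, assuming fuzzy string matching with the standard-library `difflib` (which this notebook imports). The function name `find_closest`, its parameters, and the sample gene list are hypothetical and used only for illustration.

```python
import difflib


def find_closest(query, candidates, n=3, cutoff=0.6):
    """Return up to n candidate names that fuzzily match query, best first.

    Hypothetical helper: matching is case-insensitive, but the original
    spelling of each candidate is returned.
    """
    # Map lowercase names back to their original spelling.
    lowered = {name.lower(): name for name in candidates}
    matches = difflib.get_close_matches(
        query.lower(), list(lowered), n=n, cutoff=cutoff
    )
    return [lowered[m] for m in matches]


# Illustrative gene names only; the real tool would query the database.
gene_names = ["cusR", "cusS", "glpR", "glpK"]
closest = find_closest("CusR", gene_names)
print(closest)
```

Normalizing case before matching lets a query such as "CusR" resolve to the gene name `cusR`. The real tools may use a different strategy.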
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key: ")
OpenAI API Key: ········
llm = ChatOpenAI(model="gpt-4o", temperature=0)
with open("imodulon_chat_prompt.txt", "r", encoding="utf8") as file:
    imodulon_chat_prompt = file.read()
tools = [
    learn_about_imodulons,
    find_closest_imodulon,
    find_closest_gene,
    find_closest_condition,
    get_genes_of_imodulons,
    get_condition_info,
    get_gene_info,
    get_imodulon_info,
    plot_gene_expression,
    plot_imodulon_activity,
    plot_all_imodulon_activities_for_condition,
    compare_gene_expression,
    compare_imodulon_activities,
    plot_dima,
    execute_python_code,
]
llm_with_tools = llm.bind_tools(tools)
prompt = ChatPromptTemplate.from_messages([
    ("system", imodulon_chat_prompt),
    ("placeholder", "{chat_history}"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}")
])
agent = create_tool_calling_agent(llm_with_tools, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=False)
chat_history = []
display(HTML("""
<style>
.output_wrapper, .output {
    height: auto !important;
    max-height: 1000px;
    overflow-y: auto;
}
</style>
"""))
display(Markdown("**Welcome to iModulon Chat** <br /> Type 'exit' 'quit' or 'q' to quit "))
while True:
    display(Markdown("**Input:**"))
    user_input = input()
    if user_input.lower() in ['exit', 'quit', 'q']:
        break
    # Prepare the input for the agent
    input_data = {
        "input": user_input,
        "chat_history": chat_history
    }
    # Run the agent
    response = agent_executor.invoke(input_data)
    string_response = f"**iM chat:** <br />{response['output']}"
    display(Markdown(string_response))
    #print(f"\n**iM chat: **{response['output']}\n")
    # Update the chat history
    chat_history.append(HumanMessage(content=user_input))
    chat_history.append(AIMessage(content=response['output']))
Welcome to iModulon Chat
Type 'exit' 'quit' or 'q' to quit
Input:
Hello, what is an iModulon? Can you give me an example?
Fetching information about imodulons
iM chat:
An iModulon is a group of genes that represents an independently modulated signal within a cell, likely controlled by the same or related regulators. These iModulons are identified using Independent Component Analysis (ICA) on transcriptomic datasets, which measure gene expression under various conditions. Each iModulon has a weight for each gene and an activity for each condition. The highly weighted genes are considered members of the iModulon, and the highly active conditions are those where the iModulon is likely performing a function.
Glycerol iModulon:
This iModulon would include genes that are upregulated when glycerol is available, enabling the cell to utilize glycerol for growth.
Input:
Tell me more about the Glycerol iModulon. When is it most active?
Finding closest imodulons to ['Glycerol']
Plotting Glycerol activity
iM chat:
The Glycerol iModulon is most active under the following conditions:
The Glycerol iModulon is most active in conditions related to the presence of ethanol and copper sulfate, as well as in a condition specifically involving glycerol (minE_glyc).
Input:
Tell me more about the del_cusR_lb condition. Why do you think this iModulon is most active in that condition?
Finding closest conditions to ['del_cusR_lb']
Retrieving info of ['del_cusR_lb'] conditions
iM chat:
### Condition: del_cusR_lb
The del_cusR_lb condition involves a deletion of the cusR gene in Escherichia coli K-12 MG1655 strain, grown in LB media. The cusR gene is part of the CusSR two-component system, which is involved in copper and silver resistance. The deletion of cusR could lead to a stress response or compensatory mechanisms that activate other pathways, including those involved in glycerol metabolism.
Here are some possible reasons for the high activity of the Glycerol iModulon in this condition:
These factors could contribute to the observed high activity of the Glycerol iModulon in the del_cusR_lb condition.
Input:
tell me more about the cusR gene
Finding closest genes to ['cusR']
Retrieving info of ['cusR'] genes
iM chat:
### Gene: cusR
CusR is a DNA-binding transcriptional activator involved in the copper and silver efflux systems. It is part of the two-component system CusS/CusR, which responds to increases in copper concentration. Here are some key points about CusR:
Function:
Regulation:
Response to Copper:
Genomic Context:
Non-Essential Protein:
Transcriptional Regulation:
This detailed information about CusR helps understand its role in the cell's response to copper stress and its regulatory functions, which could explain why the Glycerol iModulon is highly active in the del_cusR_lb condition.
Input:
Show me the gene expression of CusR under different conditions
Finding closest genes to ['cusR']
Plotting cusR expression
iM chat:
### Gene Expression of CusR Under Different Conditions
This data provides a comprehensive view of how CusR expression varies across different experimental conditions, highlighting its regulatory role and response to various genetic and environmental changes.
Input:
Compare the gene expression of CusR and the glpR gene
Finding closest genes to ['cusR', 'glpR']
Plotting cusR and glpR gene expression
iM chat:
### Comparison of Gene Expression: CusR vs. GlpR

Condition: del_cusR_lb (Two Component Systems)
Condition: del_cusR_CuSO4 (Two Component Systems)
Condition: aldA_f1c2 (Minicoli)
Condition: aldA_minE (Minicoli)
Condition: minE_cyt (Minicoli)
This comparison provides insights into how the expression of CusR and GlpR varies across different experimental conditions, highlighting specific conditions where their expression levels deviate from the norm.
Input:
quit
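The chat loop in this notebook calls `agent_executor.invoke` directly, so an exception raised by a tool or by the API would end the session. A minimal sketch of one way to guard a single turn follows, assuming that reporting the traceback and continuing is acceptable behaviour. The `run_turn` function and the `EchoExecutor` stand-in are hypothetical; `EchoExecutor` only mimics the `invoke` call shape used above so that the example runs without an API key.

```python
import traceback


def run_turn(executor, user_input, chat_history):
    """Invoke the agent once and return (text, ok).

    On failure, return the formatted traceback instead of raising,
    so an interactive loop can keep running.
    """
    try:
        response = executor.invoke(
            {"input": user_input, "chat_history": chat_history}
        )
        return response["output"], True
    except Exception:
        # Surface the error to the user rather than ending the session.
        return "Error:\n" + traceback.format_exc(), False


class EchoExecutor:
    """Hypothetical stand-in for AgentExecutor, for demonstration only."""

    def invoke(self, data):
        if not data["input"]:
            raise ValueError("empty input")
        return {"output": f"echo: {data['input']}"}


history = []
reply, ok = run_turn(EchoExecutor(), "hello", history)
failed_reply, failed_ok = run_turn(EchoExecutor(), "", history)
print(reply, ok, failed_ok)
```

In the notebook's loop, the chat history would then be updated only when `ok` is true, so that failed turns do not enter the conversation context.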